PolyUHK: A Robust Information Extraction System for Web Personal Names
Authors
Abstract
Personal information extraction is an important component of advanced information retrieval. Two problems must be solved in this practical task: personal name ambiguity and the extraction of personal information for a specific person. For personal name ambiguity, a very common phenomenon in fast-growing Web resources, we propose a robust system that extracts features with a totally unsupervised approach from resources beyond the given Web corpus. The experiments show that these broad features not only improve performance but also increase the robustness of a disambiguation system. For personal information extraction, we introduce a rule-based information extraction system that effectively reuses current well-developed tools and identifies the properties of Web data. The experiments show that the system achieves state-of-the-art performance, with especially high precision.
Similar resources
Spontaneous identification of individual nick name from web
A person is generally called by different names, which makes it difficult to identify that person on the web. Different people use different names for the same person; for example, Michael Jackson is called MJ, and some call him the "King of Pop", so searching for such names on the web is not trouble-free. Accurate identification of the name of a given person is useful in various web related...
Automatically Extracting Personal Name Aliases from the Web
An entity can be referred to by multiple name aliases on the web. Extracting aliases of an entity is important for various tasks such as identification of relations among entities, automatic metadata extraction and entity disambiguation. To extract relations among entities properly, one must first identify those entities. Aliases of an entity are useful as metadata for that entity and can be used ...
Automatic Discovery of Lexical Patterns using Pattern Extraction Algorithm to Identify Personal Name Aliases with Entities
Personal name aliases are extremely significant in information retrieval for retrieving complete information about a personal name from the web, as some web pages about a person may refer to him or her by an alias, a nickname, or a real name. People search involving personal name aliases is growing rapidly. We proposed a pattern generator which includes aut...
Data Extraction using Content-Based Handles
In this paper, we present an approach and a visual tool, called HWrap (Handle Based Wrapper), for creating web wrappers to extract data records from web pages. In our approach, we mainly rely on the visible page content to identify data regions on a web page. Our extraction algorithm is inspired by the way a human user scans the page content for specific data. In particular, we use text fea...
Presenting a method for extracting structured domain-dependent information from Farsi Web pages
Extracting structured information about entities from web texts is an important task in web mining, natural language processing, and information extraction. Information extraction is useful in many applications including search engines, question-answering systems, recommender systems, machine translation, etc. An information extraction system aims to identify the entities from the text and extr...